Abstract

We propose a memory abstraction able to lift existing nu- 
merical static analyses to C programs containing union 
types, pointer casts, and arbitrary pointer arithmetics. Our 
framework is that of a combined points-to and data-value 
analysis. We abstract the contents of compound variables in 
a field-sensitive way, whether these fields contain numeric or 
pointer values, and use stock numerical abstract domains to 
find an overapproximation of all possible memory states — 
with the ability to discover relationships between variables. 
A main novelty of our approach is the dynamic mapping 
scheme we use to associate a flat collection of abstract cells 
of scalar type to the set of accessed memory locations, while 
taking care of byte-level aliases, i.e., C variables with incompatible
types allocated in overlapping memory locations.
We do not rely on static type information which can be mis- 
leading in C programs as it does not account for all the uses 
a memory zone may be put to. 

Our work was incorporated within the Astrée static
analyzer, which checks for the absence of run-time errors in
embedded, safety-critical, numerically intensive software. It
replaces the former memory domain, which was limited to well-typed,
union-free, pointer-cast-free data structures. Early results
demonstrate that this abstraction allows analyzing a larger
class of C programs without much additional cost.

Categories and Subject Descriptors D.2.4 [Software 
Engineering]: Software/Program Verification — Assertion 
checkers, Formal methods, Validation; D.3.1 [Programming 
Languages]: Formal Definitions and Theory — Semantics; 
F.3.1 [Logics and Meanings of Programs]: Specifying and
Verifying and Reasoning about Programs — Assertions, In- 
variants, Mechanical verification; F.3.2 [Logics and Mean- 
ings of Programs] : Semantics of Programming Languages — 
Program analysis 

General Terms Reliability, Experimentation, Languages, 
Theory, Verification 

Keywords Abstract Interpretation, Points-to Analysis, 
Numerical Analysis, Critical Software 
1. Introduction

In embedded critical software, the slightest programming er- 
ror can have the most disastrous consequences. Even when 
the high-level specification of a software is correct, its ac- 
tual implementation using efficient but unsafe low-level lan- 
guages can introduce new kinds of bugs, such as run-time 
errors triggered by integer wrap-around or invalid floating- 
point operations — witness the demise of the Ariane launcher 
in 1996 [8]. Hence, there is a demand for tools able to check 
for potential run-time errors in low-level programs, in an 
automatic and sound way. To this end, we focus here on de- 
riving the set of values the variables of a C program can take 
during all its executions. We allow sound but generally in- 
complete approximations to ensure an efficient analysis. This 
way, we are able to report a set of alarms that encompasses 
all possible run-time error situations. Ideally, when the
analysis is sufficiently precise, it reports zero alarms, which
formally proves the absence of run-time errors.

Unfortunately, the weak type system of the C program- 
ming language complicates value analysis greatly. In the 
presence of union types, pointer arithmetics or pointer casts, 
the same sequence of memory bytes can be manipulated as 
values of distinct types. Most existing analyses avoid the 
problem by either restricting the input language or by being 
overly conservative about the contents of memory locations 
that can be accessed with incompatible types — i.e., treat 
them in a field-insensitive way. We found these solutions to
be insufficient to analyze actual embedded C codes provided 
by industrial end-users. As they exploit some knowledge of 
the bit-representation of values and the low-level semantics 
of operators, tracking precisely the manipulated values is 
required to prove the absence of run-time errors. 

To address these problems, we propose a field-sensitive 
value analysis for C programs containing union types, 
pointer arithmetics and pointer casts. Our main contribu- 
tion is an abstraction that maps the memory, viewed as 
untyped spans of bytes, to a collection of synthetic cells 
with integer or floating-point type. We then rely on existing 
alias-unaware numerical analyses, such as intervals [5] or
octagons [15], to infer numerical invariants on cells. Our
abstraction translates operations on byte-based memory lo- 
cations into operations on cells, taking care of byte-level 
aliasing between cells. We use a dynamic mapping because, 
due to pointer casts, the uses of the memory cannot be 
deduced from the static type information only. The sound- 
ness of our approach is proved in the Abstract Interpreta- 
tion framework. We first construct a non-standard concrete 
semantics that gives a formal meaning to unclean C con- 
structs. Then, we abstract it to derive a static analysis that 
is sound by construction. This makes our design modular (it
can be used with any underlying numerical domain, even a
relational one such as octagons) and extensible (new abstractions
based on the same concrete semantics can be designed). The
abstraction is currently limited to programs without unbounded
dynamic memory allocation or recursion. We deliberately left out
these features as they are generally forbidden in critical
software. Our work was integrated within the Astrée analyzer [3],
which checks for run-time errors in embedded critical C code and
provides tight variable bounds in a few hours of computation time.

Overview of the Paper. Sect. [2] motivates our work by
presenting a few realistic code examples involving union
types and complex pointer arithmetics; they cannot be analyzed
soundly without considering byte-level aliasing. In
Sect. [3], we present our solution to this problem in an intuitive
way. Sect. [4] then formalizes our approach in the
Abstract Interpretation framework. Sect. [5] presents preliminary
experimental results obtained with the Astrée analyzer.
Sect. [6] presents related work. Finally, Sect. [7] discusses
future work and Sect. [8] concludes.

2. The Need for a New Memory Domain 

The simplest framework, when performing a value analysis,
is to consider programs with a statically known set of vari- 
ables, each having a scalar type: real (i.e., integer or floating- 
point) or pointer type. Such analyses can be lifted to cope 
with variables of aggregate type — arrays and structures — 
by decomposing them into collections of independent cells 
of scalar type. Much literature has been devoted to the 
problems of abstracting numerical invariants, performing 
pointer analysis, or summarizing aggregate variables into 
fewer cells — to cope with large arrays or dynamic memory 
allocation. We are concerned here with the case where the 
basic hypothesis of these works fails: the memory cannot be
decomposed a priori into a set of independent cells. This 
happens in a language such as C that permits very low-level 
accesses to the memory and the bit-representation of data. 

Union Types. Union types declare fields that, unlike ag- 
gregate types, share the same memory locations. As a con- 
sequence, access paths to cells may be aliased. Consider 
Fig. [1], which implements message objects using a dynamic type
tag type. In the process function, m->T.type, m->A.type
and m->B.type all refer to the same cell containing an int.
It is perfectly legal to modify the cell using one access path
and read back its contents using another one [14, §6.5.2.3.5].
This kind of aliasing is quite benign as it does not prevent us 
from viewing the memory as a collection of distinct cells — 
e.g., using offsets instead of access paths to denote cells. 

A programmer may, however, disregard the value of type, 
write into m->A.a[0] and read back m->B.x, thus mixing 
access paths referring to (partially) overlapping memory 
locations of different types. Although such mixing is strongly
discouraged by the C norm [14] and relies on unportable
assumptions on structure layouts and value encodings, it is
surprisingly often performed by programmers. Consider the
variable regs modeling, in Fig. [4], the register state of an
Intel 8086 processor. It is expected that, when modifying the
word register regs.w.ax, its low- and high-byte components
regs.b.al and regs.b.ah are updated and can safely be
read back. Due to this byte-level aliasing, no partition of the 
memory into scalar cells exists. 
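
The byte-level aliasing described above can be observed directly on a host machine. The following is a minimal sketch, not taken from the analyzed code: the type name regs8086, the helper names, and the use of the fixed-width types of <stdint.h> are assumptions made for illustration. It writes the word register through one view of the union and reads a byte register back through the other view.

```c
#include <stdint.h>

/* Illustrative register file in the spirit of Fig. 4; the type name and the
   fixed-width types from <stdint.h> are assumptions of this sketch. */
union regs8086 {
    struct { uint8_t al, ah, bl, bh; } b;   /* byte view */
    struct { uint16_t ax, bx; } w;          /* word view */
};

/* Returns 1 when the host stores the low-order byte first. */
static int is_little_endian(void) {
    uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Writes ax through the word view, then reads ah through the byte view. */
static uint8_t ah_after_writing_ax(uint16_t value) {
    union regs8086 regs = { { 0, 0, 0, 0 } };
    regs.w.ax = value;
    return regs.b.ah;
}

/* Writes ax through the word view, then reads al through the byte view. */
static uint8_t al_after_writing_ax(uint16_t value) {
    union regs8086 regs = { { 0, 0, 0, 0 } };
    regs.w.ax = value;
    return regs.b.al;
}
```

On a little-endian host, ah_after_writing_ax(0x1234) yields the high byte of the word and al_after_writing_ax(0x1234) its low byte; on a big-endian host the two roles are swapped, which is why the analysis has to be parameterized by the ABI.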



The terms real, scalar, aggregate come from the C norm [14].



struct msgA { int type; int a[2]; };
struct msgB { int type; double x; };

union msg {
  struct { int type; } T;
  struct msgA A;
  struct msgB B;
};

void process(union msg *m) {
  switch (m->T.type) {
  case 0: {
    struct msgA* msga = &(m->A);
    int data = msga->a[0] + 1;
    /* work on msga */
  }
  case 1: {
    struct msgB* msgb = &(m->B);
    /* work on msgb */
  }
  }
}

void read_sensor_4(unsigned* m) {
  /* put 4 bytes from sensors into m */
}

void main(void) {
  unsigned char buf[sizeof(union msg)];
  int i;
  for (i=0; i<sizeof(buf)/4; i++)
    read_sensor_4((unsigned*)buf + i);
  process((union msg*)buf);
}

Figure 1. Message manipulation example illustrating the 
use of union types. 



void memcopy(void* dst, void* src, unsigned sz) {
  unsigned char* s = (unsigned char*) src;
  unsigned char* d = (unsigned char*) dst;
  unsigned i;
  for (i=0; i<sz; i++) d[i] = s[i];
}

int get(unsigned char* buf) {
  struct { int *p; ... } S;
  memcopy(&S, buf+16, sizeof(S));
  return *(S.p);
}



Figure 2. User-defined generic memory copy procedure. 



void memcopy(void* dst, void* src, unsigned sz) {
  char* s = (char*) src;
  char* d = (char*) dst;
  for (; sz>=8; sz-=8, s+=8, d+=8)
    *((double*)d) = *((double*)s);
  for (; sz!=0; sz--, s++, d++) *d = *s;
}



Figure 3. Alternate user-defined memory copy procedure. 



Pointer Arithmetics. Pointer arithmetics encompasses 
array indexing. For instance, given the following declaration: 

struct { int a [3] ; int b; } U, V; 

*(U.a+2) is equivalent to U.a[2]. But pointer arithmetics 
also allows escaping from an array embedded within a larger 
type, breaking standard out-of-bound array analyses. For in- 
stance, *(U.a+3) can safely be considered equivalent to U.b 
for most compilers. No assumption can generally be made, 
however, on the relative position of U and V in memory: 
U.a[4] is considered a run-time error and U.a+4 points to 
an unspecified location outside U — generally not within V. 
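
The layout facts used in this paragraph can be checked without ever forming an out-of-bounds pointer. The sketch below is an illustration under assumptions: the struct name pair and the helper names are hypothetical, and the answers depend on the host ABI. It computes the byte offset reached by U.a+index from offsetof and sizeof only.

```c
#include <stddef.h>

/* Same shape as the declaration above; the struct name is hypothetical. */
struct pair { int a[3]; int b; };

/* Byte offset designated by the pointer expression U.a + index, computed
   from layout information only (no pointer arithmetic is performed). */
static size_t offset_of_a_plus(size_t index) {
    return offsetof(struct pair, a) + index * sizeof(int);
}

/* 1 if U.a + index designates exactly the field b under the host ABI. */
static int lands_on_b(size_t index) {
    return offset_of_a_plus(index) == offsetof(struct pair, b);
}

/* 1 if U.a + index points inside U or one byte past its end. */
static int stays_within_object(size_t index) {
    return offset_of_a_plus(index) <= sizeof(struct pair);
}
```

On common ABIs, lands_on_b(3) holds because no padding is inserted between an int array and a following int field, which is the kind of ABI knowledge the analysis relies on.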

Pointer Casts. Pointer casts allow considering any part 
of the memory as having any type. Consider the main function
in Fig. [1]. It declares buf as an array of unsigned char
but actually uses it both as a reference to an unsigned 
int (when calling read_sensor_4) and as a message of type 
union msg (when calling process). This achieves the same 
effect as a union type, except that the set of possible cell 
layouts is no longer embedded within the static type of the 
variable. It must be guessed dynamically. An extreme illus- 
tration of this problem is given by the generic memory copy 
functions memcopy of Fig. [2] (a portable, one-byte-at-a-time 
version) and Fig. [3] (an optimized version that copies in
chunks of eight bytes, inspired by actual PowerPC software).
There, the void* type is used to achieve polymorphism.
This effectively discards all type information that
would hint at the structure of the memory from src to 
src+sz-1. Despite this lack of typing information, we must 
be able to copy multi-byte cells from src to dst in a way 
consistent with their type. In order to treat precisely the 
indirect addressing *(S.p) at the end of the get function in 
Fig. [2], it is paramount to copy "as-is" the pointer value hidden
at offset 16 in buf. We refer the reader to Siff et al. [21]
for more examples of type casts used in real-life C programs. 
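
To make the requirement concrete, here is a small runnable variant of the scenario of Fig. [2]. It is a sketch under assumptions: the struct record, its tag field, and the function roundtrip_deref are hypothetical names introduced for illustration. A structure holding a pointer is copied byte by byte into an untyped buffer and back, and the recovered pointer is then dereferenced.

```c
#include <stddef.h>

/* Byte-per-byte copy in the style of Fig. 2. */
static void memcopy(void *dst, const void *src, size_t sz) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < sz; i++) d[i] = s[i];
}

/* Hypothetical record with a pointer field, standing in for S in Fig. 2. */
struct record { int *p; int tag; };

/* Stores a record at byte offset off of an untyped buffer, reads it back
   through a second byte-wise copy, and follows the recovered pointer.
   The caller must provide at least off + sizeof(struct record) bytes. */
static int roundtrip_deref(int *target, unsigned char *buf, size_t off) {
    struct record in = { target, 7 }, out;
    memcopy(buf + off, &in, sizeof in);
    memcopy(&out, buf + off, sizeof out);
    return *out.p;
}
```

The dereference is only meaningful because every byte of the pointer survives the copy unchanged; an analysis that forgets the contents of buf at offset 16 could not say anything about the result.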

3. Overview of the Analysis 

In this section, we only try to present the gist of our analysis 
in an informal way. The next section will be devoted to its 
precise, mathematical definition. 

3.1 Assumptions 

Limitations. Our analysis computes, for each control 
state, an overapproximation of the reachable memory states, 
where a control state is given by a program point together 
with a call-stack. For the sake of simplicity, we place our- 
selves in the context of a fully context-sensitive analysis on 
code without recursive procedures or dynamic memory
allocation. Our main hypothesis is that the set $\mathcal{V}_c$ of variables
whose contents define the memory state (global and
local variables from all stack frames) is a static function of
the control state c only. In practice, this is valid when analyzing
embedded C code (where malloc and recursion are
prohibited) with a high level of precision (requiring context
sensitivity). However, we believe that these limitations may
be overcome using summarization techniques that are orthogonal
to our purpose, e.g., heap abstraction as in [20],
array summarization [10], or procedure summarization [24].

Application Binary Interface. In order to achieve a 
high-level of description and discourage unportable prac- 
tices, the C norm [14] under-specifies many parts of the lan- 
guage. In particular, the exact encoding of scalar types as 
well as the layout of fields in structures are mostly left to the 



implementor. However, in order to ensure the interoperabil- 
ity of compiled programs, libraries, and operating systems, 
the precise representation of types is standardized in so- 
called implementation-specific Application Binary Interfaces 
(or ABI) such as [1]. Although it is possible to write fully
portable, ABI-neutral C code, our purpose here is the anal- 
ysis of C programs that make explicit use of architecture- 
dependent features — such as embedded programs that need 
to be efficient and have a low-level access to the system. 
Thus, our analysis is parameterized by ABI functions, such
as sizeof : $\mathcal{V}_c \to \mathbb{N}$, which gives the byte size of each variable.

Input Language. We suppose that each C function has 
been processed into a control-flow graph where basic blocks 
are either assignment or guard instructions involving only 
side-effect free expressions. Moreover, using our knowledge 
of the ABI, all pointer arithmetics has been broken down 
to the byte-level. Except for the purpose of dereferencing, 
all pointers can be assumed to be pointers to unsigned 
char. All memory reads and writes are performed through 
pointer dereferencing. We assume that these involve only 
scalar types (i.e., integers, floating-points, and pointers). 
Likewise, field selection . and -> in struct and union, as 
well as array indexes [ ] have been converted into byte-level 
pointer arithmetics and dereferences of values of scalar type. 
As these are usual static simplifications performed by most 
compilers and analyzer front-ends, we do not present them 
in more detail. Constructs that do not fit in this simplified
framework (such as function pointers or assignment of
compound values) will be dealt with in Sect. [4].

Numerical Analysis Parameter. Our analysis is param- 
eterized by a standard numerical analysis. Following the Ab- 
stract Interpretation framework [5], we suppose that it is 
given in the form of a numerical abstract domain, i.e., an 
abstract representation of invariants together with abstract 
transfer functions to mimic, in the abstract, the effect of 
instructions and control-flow joins. In theory, such an analysis
outputs an invariant $I_c$ on $\mathcal{V}_c$ at each program point c.
However, it supposes that variables are unaliased and have
real type (i.e., integer or floating-point), which is not the
case for $\mathcal{V}_c$. Thus, we do not use the numerical domain directly
on $\mathcal{V}_c$ but on some collection $\mathcal{C}_c$ of synthetic cells of
real type. We provide an abstraction of the memory layout
that drives the numerical analysis by dynamically managing
$\mathcal{C}_c$, translating instructions over $\mathcal{V}_c$ into instructions over $\mathcal{C}_c$,
and taking care of byte-level aliasing between cells in $\mathcal{C}_c$.

3.2 Abstract Memory Layout 

Each variable V is viewed as an unstructured sequence of
sizeof(V) contiguous bytes. Its layout in $\mathcal{C}_c$ is initially
empty. It will be populated with possibly overlapping cells of
scalar type as V is accessed. Abstracting a program instruction
is done in three steps. First, we enrich the layout by
adding all cells targeted by a dereference in the instruction.
Secondly, we evaluate, in the numerical domain, the instruction
where all dereferences have been replaced with cells.
Thirdly, we remove all cells invalidated by alias-induced side
effects. When a layout $\mathcal{C}_c$ is changed, the corresponding cells
are created or deleted in the numerical invariant $I_c$.

We illustrate this mechanism on the example of Fig. [4].
Fig. [5] gives the abstract memory layouts $\mathcal{C}_1$ to $\mathcal{C}_7$ of the
variable regs at program points (1) to (7).

• The assignment regs.w.ax = X first creates a new cell,
named ax, of type uint16, occupying offsets 0 and 1 in
the variable regs (see the top of Fig. [5]). Supposing that



static union {
  struct { uint8 al,ah,bl,bh,... } b;
  struct { uint16 ax,bx,... } w;
} regs;

regs.w.ax = X; (1)
if (!regs.b.ah) (2) regs.b.bl = regs.b.al; (3)
else (4) regs.b.bh = regs.b.al; (5)
(6) regs.b.al = X; (7)

Figure 4. Register state of an Intel 8086 processor and
sample code to manipulate it. (1) to (7) represent program
points of interest (see Fig. [5]).

Figure 5. Memory layouts $\mathcal{C}_1$ to $\mathcal{C}_7$ for the variable regs
at program points (1) to (7) when analyzing Fig. [4].



X corresponds to cell X, the assignment is then evaluated
as $ax \leftarrow X$ in the numerical domain to yield $I_1$.

• Before executing the test !regs.b.ah, the cell ah of type
uint8 is created at offset 1 in regs. If the ABI tells
us that the computer uses a little-endian byte ordering,
the cell can be initialized using the constraint ah =
ax/256 on $I_1$. Finally, the test is executed by adding
the constraint ah = 0. When backed by a sufficiently
powerful numerical domain, we may be able to infer that
X/256 = ax/256 = ah = 0, i.e., $X \in [0, 255]$, at (2).

• Let us now consider the control-flow join following the
conditional branches (3) and (5). As $\mathcal{C}_3 \neq \mathcal{C}_5$, we must
unify cell sets. This is done by adding missing cells: bh is
added to $\mathcal{C}_3$ and bl to $\mathcal{C}_5$. As regs is declared static, its
bytes are initialized to 0, according to the C norm [14].
Thus, we add the constraint bh = 0 to $I_3$ and bl = 0 to
$I_5$. The control-flow join is then performed safely in the
numerical domain, using the cell set $\mathcal{C}_6 = \mathcal{C}_3 \cup \mathcal{C}_5$.



• The last assignment, regs.b.al = X, can be directly
evaluated as $al \leftarrow X$ in the numerical domain because
all the cells involved exist. However, modifying al also
modifies ax, a fact the numerical domain is not aware
of. We correct the invariant $I_7$ by deleting, after the
assignment, all cells that overlap the modified cells. Thus,
$ax \notin \mathcal{C}_7$. Note that ax will be re-created when regs.w.ax is
accessed next, and its contents will be synthesized using
fresh information from the overlapping cells ah and al.

• In the presence of loops, we iterate the abstract computation
until it stabilizes. Numerical domains usually use
special join-like binary operators $\nabla^\sharp$, so-called widenings
[5], to accelerate fixpoint computations. As for control-flow
joins, we first unify the cell layouts of the arguments,
and then apply $\nabla^\sharp$ in the numerical domain.
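
The layout bookkeeping used in this walkthrough (create the cells a dereference targets, then delete the cells that a write invalidates) can be sketched as follows. This is an illustrative data structure, not the representation used in the analyzer: the fixed capacity MAX_CELLS, the two-type enumeration, and all function names are assumptions. Only the layout of a single variable is modeled; the numerical invariant attached to the cells is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 16   /* arbitrary capacity chosen for this sketch */

enum cell_type { T_UINT8, T_UINT16 };

/* A cell is a typed window inside one variable. */
struct cell { size_t offset; size_t size; enum cell_type type; };

/* Layout of one variable: the set of cells currently tracked. */
struct layout { struct cell cells[MAX_CELLS]; size_t count; };

static struct cell cell_make(size_t offset, enum cell_type type) {
    struct cell c;
    c.offset = offset;
    c.type = type;
    c.size = (type == T_UINT16) ? 2 : 1;
    return c;
}

static int cell_same(struct cell a, struct cell b) {
    return a.offset == b.offset && a.type == b.type;
}

/* Two cells overlap when they share at least one byte. */
static int cell_overlaps(struct cell a, struct cell b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

static void layout_init(struct layout *l) { l->count = 0; }

static int layout_has(const struct layout *l, struct cell c) {
    for (size_t i = 0; i < l->count; i++)
        if (cell_same(l->cells[i], c)) return 1;
    return 0;
}

/* First step: make sure the dereferenced cell exists in the layout. */
static void layout_add(struct layout *l, struct cell c) {
    if (layout_has(l, c)) return;
    assert(l->count < MAX_CELLS);
    l->cells[l->count++] = c;
}

/* Third step: after a write to `written`, drop every other cell that
   overlaps it. Returns the number of cells removed. */
static size_t layout_kill_overlaps(struct layout *l, struct cell written) {
    size_t kept = 0, removed = 0;
    for (size_t i = 0; i < l->count; i++) {
        struct cell k = l->cells[i];
        if (!cell_same(k, written) && cell_overlaps(k, written))
            removed++;
        else
            l->cells[kept++] = k;
    }
    l->count = kept;
    return removed;
}
```

Replaying Fig. [4] with this sketch, adding ax (offset 0, uint16), ah (offset 1), bl (offset 2) and al (offset 0) and then calling layout_kill_overlaps for al removes exactly ax and keeps ah and bl.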

3.3 Pointer Abstraction 

Numerical domains can only abstract directly environments 
over cells of real type, not pointer type. Thankfully, a pointer 
value can be viewed as a pair $(V, o) \in \mathcal{V}_c \times \mathbb{N}$, which
represents the offset o, in bytes, from the beginning of the 
variable V . We abstract each component independently: one 
cell of integer type is allocated in the numerical domain for 
each pointer cell to represent its possible offsets while we 
maintain, in the memory layout, a map associating a set 
of base variables to each pointer cell. Pointer arithmetics 
in expressions are straightforwardly translated into integer 
arithmetics on offsets and then fed to the numerical domain. 
One benefit of this is that we are able to find relationships
between pointer and integer values. For instance, when using
the polyhedron domain, we can infer that s=src+sz holds at
the end of the memory copy procedure of Fig. [3].
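
A possible shape for such an abstract pointer is sketched below, using intervals for the offset component. Everything here is an assumption made for illustration: base variables are identified by small integers and stored in a bit set, the NULL and erroneous pointer values are not modeled, and offset bounds are assumed small enough that additions do not overflow.

```c
#include <assert.h>

/* Interval of byte offsets, non-empty when lo <= hi. */
struct offsets { long lo, hi; };

/* Abstract pointer: bit i of `bases` means "may point into variable i";
   a single offset interval is shared by all bases. */
struct abs_ptr { unsigned bases; struct offsets off; };

static struct offsets offsets_make(long lo, long hi) {
    struct offsets o;
    assert(lo <= hi);
    o.lo = lo;
    o.hi = hi;
    return o;
}

static struct abs_ptr abs_ptr_make(unsigned base, long lo, long hi) {
    struct abs_ptr p;
    assert(base < 16);
    p.bases = 1u << base;
    p.off = offsets_make(lo, hi);
    return p;
}

/* Pointer arithmetic becomes integer arithmetic on the offset component. */
static struct abs_ptr abs_ptr_add(struct abs_ptr p, struct offsets delta) {
    p.off.lo += delta.lo;
    p.off.hi += delta.hi;
    return p;
}

/* Join: union of the base sets, interval hull of the offsets. */
static struct abs_ptr abs_ptr_join(struct abs_ptr p, struct abs_ptr q) {
    struct abs_ptr r;
    r.bases = p.bases | q.bases;
    r.off.lo = p.off.lo < q.off.lo ? p.off.lo : q.off.lo;
    r.off.hi = p.off.hi > q.off.hi ? p.off.hi : q.off.hi;
    return r;
}

/* 1 if the concrete pointer (base, offset) is described by p. */
static int abs_ptr_may_be(struct abs_ptr p, unsigned base, long offset) {
    return base < 16 && ((p.bases >> base) & 1u)
        && p.off.lo <= offset && offset <= p.off.hi;
}
```

Joining a pointer into variable 0 at offsets [4, 8] with a pointer into variable 2 at offset 16 yields a value that also admits (2, 4): sharing one offset component across all bases trades some precision for a smaller numerical environment.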

3.4 Intersection Semantics 

As will become clear in Sect. [4.4.3], we actually use an
intersection semantics for overlapping cells. Suppose, for
instance, that the analysis of Fig. [4] found some numerical
invariant $I_c$ with respect to the layout $\mathcal{C}_c = \{ah, ax\}$. Then,
taking byte-level aliasing into account, regs.b.ah = v is
possible only if both ax's high byte and ah can be v: $I_c$
actually stands for $I_c \wedge (ah = ax/256)$. This semantics
ensures that, when reading a cell's contents, it is safe to 
ignore overlapping cells — we simply lose some constraints, 
which is sound. In practice, when there already exists a cell 
in C c with the correct type and offset — which is the most 
common case — we use it without looking at overlapping 
cells while, when it needs to be created, we use existing 
overlapping cells to synthesize good and safe initial values. 
Dually, when writing a cell's contents, we must take care 
to update all overlapping cells as they give constraints that 
were true before the assignment but are no longer valid. In 
practice, we destroy such cells, which actually delegates the 
update to the next time the cell is created back by a read. 

Another legitimate choice would have been a union semantics.
However, this would have made writes cheap and
reads costly. We favored the cheap reads of the intersection
semantics. Also, as we will see in Sect. [4.4.3], the intersection
semantics has a very natural formalization.
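
As an illustration of how a new byte cell can be synthesized from an overlapping word cell, and of what the implicit constraint ah = ax/256 means on intervals, consider the following sketch. It assumes the interval domain, a little-endian 16-bit ax, and hypothetical helper names; it is not the analyzer's implementation.

```c
#include <assert.h>

/* Interval of unsigned values, non-empty when lo <= hi. */
struct range { unsigned lo, hi; };

static struct range range_make(unsigned lo, unsigned hi) {
    struct range r;
    assert(lo <= hi);
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* Initial value for a fresh cell ah from the overlapping cell ax:
   ah = ax / 256, which is monotone in ax. */
static struct range high_byte_of(struct range ax) {
    assert(ax.hi <= 0xFFFFu);
    return range_make(ax.lo / 256, ax.hi / 256);
}

/* Initial value for a fresh cell al: al = ax mod 256. The bounds carry
   over only when every value of ax has the same high byte. */
static struct range low_byte_of(struct range ax) {
    assert(ax.hi <= 0xFFFFu);
    if (ax.lo / 256 == ax.hi / 256)
        return range_make(ax.lo % 256, ax.hi % 256);
    return range_make(0, 255);
}

/* Intersection of two intervals; returns 0 when it is empty. */
static int range_meet(struct range a, struct range b, struct range *out) {
    unsigned lo = a.lo > b.lo ? a.lo : b.lo;
    unsigned hi = a.hi < b.hi ? a.hi : b.hi;
    if (lo > hi) return 0;
    *out = range_make(lo, hi);
    return 1;
}
```

With ax in [0, 0x2FF] and an existing cell ah in [1, 9], the layout {ah, ax} denotes only the states where ah lies in the meet [1, 2]; reading ah alone and ignoring ax returns [1, 9], which is coarser but still sound.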

For performance reasons in the numerical domain, we
should avoid creating too many cells. Our scheme keeps several
redundant cells per memory byte. Thankfully, redundancy
is bounded by the low number of scalar types (13, as
shown in Fig. [6]). As cells are created lazily and destroyed
often, there is little long-lived redundancy in practice. Moreover,
by associating only one offset variable per pointer (and
not one for each base variable the pointer can point to), we
lose some precision but avoid a potential quadratic blow-up.

4. Formalization of the Analysis 

4.1 Abstract Interpretation 

We formalize our analysis in the Abstract Interpretation 
framework, a general theory of the approximation of pro- 
gram semantics introduced by Cousot and Cousot in [5]. 
It allows the systematic design of static analyses with var- 
ious levels of precision. The gist of the method is first to 
design a concrete semantics, the most precise mathematical
expression of program behaviors. This step emphasizes
expressiveness only and generally results in a non-computable
semantics. Then, sound abstractions are performed
and composed until a computable semantics is derived from
the concrete one. This results in an abstract interpreter that
can be run without user intervention, terminates on all programs,
and is, by construction, sound with respect to the
concrete semantics. It is, however, often incomplete.
Abstractions should be carefully chosen based on the class of
properties to be checked, the class of programs analyzed, and 
the amount of resources to be invested in the static analysis. 

There exists a large library of abstract domains that 
provide ready-to-use abstract computation algorithms. Formally,
given a concrete universe $\mathcal{D}$ where the concrete semantics
is formalized, an abstract domain is given by:

• a set $\mathcal{D}^\sharp$ of computer-representable abstract properties,

• a concretization function $\gamma : \mathcal{D}^\sharp \to \mathcal{P}(\mathcal{D})$ assigning a
meaning to each abstract property,

• a computable partial order $\sqsubseteq^\sharp$ on $\mathcal{D}^\sharp$ such that $\gamma$ is monotonic:
$X^\sharp \sqsubseteq^\sharp Y^\sharp \Rightarrow \gamma(X^\sharp) \subseteq \gamma(Y^\sharp)$; it models the relative
precision of abstract elements and enables fixpoint
abstractions through iteration schemes, potentially accelerated
using special extrapolation operators $\nabla^\sharp$,

• for each n-ary semantical operator $F : \mathcal{D}^n \to \mathcal{P}(\mathcal{D})$,
an abstract, computable version $F^\sharp : (\mathcal{D}^\sharp)^n \to \mathcal{D}^\sharp$ that is
sound, i.e., $\forall X_i^\sharp \in \mathcal{D}^\sharp,\ \forall X_i \in \gamma(X_i^\sharp):\ F(X_1, \ldots, X_n) \subseteq (\gamma \circ F^\sharp)(X_1^\sharp, \ldots, X_n^\sharp)$.

We refer the reader to [5, 6] for more information on the
theory of Abstract Interpretation and its applications.
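
As a concrete instance of the four ingredients listed above, the classic interval domain can be sketched in a few lines. This is a simplified illustration, not the numerical domain used in the analyzer: LONG_MIN and LONG_MAX stand for the infinite bounds, finite bounds are assumed small enough that sums do not overflow, and all names are hypothetical.

```c
#include <limits.h>

/* Abstract property: an interval [lo, hi]; LONG_MIN and LONG_MAX are used
   as stand-ins for minus and plus infinity (assumption of this sketch). */
struct itv { long lo, hi; };

static struct itv itv_make(long lo, long hi) {
    struct itv r;
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* Concretization, given as a membership test: is x in gamma(a)? */
static int itv_contains(struct itv a, long x) {
    return a.lo <= x && x <= a.hi;
}

/* Abstract order: a is at least as precise as b. */
static int itv_leq(struct itv a, struct itv b) {
    return b.lo <= a.lo && a.hi <= b.hi;
}

/* Join used at control-flow merges. */
static struct itv itv_join(struct itv a, struct itv b) {
    return itv_make(a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi);
}

/* Widening: any bound that is still growing jumps to infinity. */
static struct itv itv_widen(struct itv a, struct itv b) {
    return itv_make(b.lo < a.lo ? LONG_MIN : a.lo,
                    b.hi > a.hi ? LONG_MAX : a.hi);
}

/* Sound abstract version of the binary operator +. */
static struct itv itv_add(struct itv a, struct itv b) {
    long lo = (a.lo == LONG_MIN || b.lo == LONG_MIN) ? LONG_MIN : a.lo + b.lo;
    long hi = (a.hi == LONG_MAX || b.hi == LONG_MAX) ? LONG_MAX : a.hi + b.hi;
    return itv_make(lo, hi);
}
```

For a counter initialized to 0 and incremented in a loop, one widening step from [0, 0] against [0, 1] gives [0, +infinity), and a further iteration stays below that value in the abstract order, so the iteration stops.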

4.2 Language 

We suppose that the program has been preprocessed into the
simple language of Fig. [6]. Each type denotes not only a set of
possible values, but also their bit-representation in memory.
We assume that all pointers use the same representation,
and so, use a single type denoted by ptr. In expressions,
$\mathcal{V}_c$ and $\mathcal{F}$ denote respectively the set of variables at control
point c and the set of functions. We have distinguished two kinds of
assignments: assignments of expressions of scalar type, and 
copy assignments of arbitrary data structures. 

4.3 Concrete Memory Domain $\mathcal{D}_M$

We now introduce a non-standard, low-level semantics $\mathcal{D}_M$
that gives a meaning to the programs of Sect. [4.2].

4.3.1 Concrete Memory Representation 

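Values. Let us denote by $\mathbb{V}_\tau$ the set of values of scalar
type $\tau$. For real types, $\mathbb{V}_\tau$ is a finite subset of $\mathbb{R}$. Pointer
values range in the following set:

$$\mathbb{V}_{ptr} = \{ (V,i) \mid V \in \mathcal{V}_c \cup \mathcal{F},\ 0 \le i \le \mathrm{sizeof}(V) \} \cup \{0, \omega\}$$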



int-sign    ::= unsigned | signed
int-type    ::= char | short | int | long | long long
float-type  ::= float | double | long double
real-type   ::= int-sign int-type | float-type
scalar-type ::= real-type | ptr
type        ::= scalar-type
              | type[n]                                    $n \in \mathbb{N}$
              | struct { Id_1 : type, ..., Id_n : type }
              | union { Id_1 : type, ..., Id_n : type }

expr ::= cst                     $cst \in \mathbb{R}$
       | &V
       | $\diamond$ expr                  $\diamond \in \{-, \sim, !\}$
       | expr $\diamond$ expr             $\diamond \in \{+, <, \&, ||, \ldots\}$
       | $*_\tau$ expr                $\tau \in$ scalar-type
       | ($\tau$) expr               $\tau \in$ scalar-type

inst ::= $*_\tau$ expr $\leftarrow$ expr           $\tau \in$ scalar-type
       | $*_\tau$ expr $\leftarrow$ $*_\tau$ expr      $\tau \in$ type
       | expr == 0 ?

Figure 6. Language Syntax.



where 0 is the NULL pointer while $\omega$ represents all erroneous
pointer values. Valid pointers are (base, offset) pairs.
Following the C norm, data pointers can point one byte past
the end of a variable. To treat function pointers the same
way as data pointers, it is sufficient to extend sizeof so
that sizeof(V) = 0 when $V \in \mathcal{F}$: valid function pointers
always have a null offset.

Memory State. We decompose the memory into a collection
$\mathcal{B}$ of untyped byte locations:

$$\mathcal{B}(\mathcal{V}_c) = \{ (V,i) \mid V \in \mathcal{V}_c,\ 0 \le i < \mathrm{sizeof}(V) \}$$

The set of values a byte can hold is defined as the following
set $\mathbb{V}$ of triples:

$$\mathbb{V} = \{ (\tau, b, v) \mid \tau \in \text{scalar-type},\ 0 \le b < \mathrm{sizeof}(\tau),\ v \in \mathbb{V}_\tau \}$$

where $(\tau, b, v)$ represents symbolically the b-th byte of the
representation of the value v of scalar type $\tau$. A concrete
memory state associates a byte value to each byte location:
$\mathcal{D}_M(\mathcal{V}_c) = \mathcal{B}(\mathcal{V}_c) \to \mathbb{V}$. Note that, in actual computers, the
memory maps byte locations to numbers within [0, 255]. Our
memory representation is slightly higher-level as it abstracts
away the encoding from $\mathbb{V}$ to [0, 255]. For instance, the base
variable of pointers is kept symbolic so that our semantics
is independent of the absolute address chosen for the
variables by memory allocation services.

Value Recomposition. Due to pointer casts, a sequence
of bytes may be dereferenced as a value of any type. Thus, we
now suppose that we are given a family of functions $\phi_\tau$ that
construct all the values of type $\tau \in$ scalar-type corresponding
to a given byte sequence: $\phi_\tau : \mathbb{V}^{\mathrm{sizeof}(\tau)} \to \mathcal{P}(\mathbb{V}_\tau)$. Note
that, to allow a conservative modeling of casting, the functions
may output several values. The exact definition of $\phi$ is
highly dependent upon the ABI. We provide, in Fig. [7], an example
definition valid for Intel x86 processors. It embeds useful
information, such as the fact that the NULL pointer 0 is always
represented as the integer 0, or that integers are represented using two's
complement arithmetics and little-endian byte ordering (it
models precisely the regs variable in Fig. [4]). However, when
the value depends upon information abstracted away by our



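$$\phi_\tau((\tau_0, b_0, v_0), \ldots) = \{v\} \quad \text{if } \forall k,\ v_k = v,\ \tau_k = \tau,\ b_k = k$$

$$\phi_{\text{unsigned char}}((\tau, b, v)) = \begin{cases} \{0\} & \text{if } \tau = \text{ptr and } v = 0 \\ \{\lfloor v / 256^b \rfloor \bmod 256\} & \text{if } \tau \in \text{int-type} \\ [0, 255] & \text{otherwise} \end{cases}$$

$$\phi_{\text{unsigned } t}(x_0, \ldots) = \Big\{ \sum_k 256^k \times y_k \ \Big|\ \forall k,\ y_k \in \phi_{\text{unsigned char}}(x_k) \Big\}$$

$$\phi_{\text{signed } t}(x) = \{ w \mid (w + 2^{8\,\mathrm{sizeof}(t)}\mathbb{Z}) \cap \phi_{\text{unsigned } t}(x) \neq \emptyset,\ w \in [-2^{8\,\mathrm{sizeof}(t)-1},\ 2^{8\,\mathrm{sizeof}(t)-1} - 1] \}$$

$$\phi_{\text{ptr}}(x) = \begin{cases} \{0\} & \text{if } \phi_{\text{unsigned long}}(x) = \{0\} \\ \mathbb{V}_{ptr} & \text{otherwise} \end{cases}$$

in all other cases, $\phi_\tau(x) = \mathbb{V}_\tau$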

Figure 7. Value recomposition function example. 



semantics (e.g., non-0 pointers cannot be converted to integers
without knowing the absolute address of variables)
or when we are not interested in the precise behavior of a
particular construction (e.g., reading the binary representation
of floating-point values), we use a conservative definition:
$\phi_\tau(x) = \mathbb{V}_\tau$. If need be, these cases can be refined.
Dually, we may trade precision for generality, e.g., drop our
assumption on the byte ordering of integers.
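
The signed case of the recomposition function amounts to picking, among the integers congruent to the unsigned reading modulo $2^{8\,\mathrm{sizeof}(t)}$, the one that lies in the signed range. A minimal sketch follows, assuming two's complement, 8-bit bytes, and types of at most four bytes; the function name is hypothetical.

```c
#include <assert.h>

/* Maps the unsigned reading u of a size-byte integer to the unique w in
   [-2^(8 size - 1), 2^(8 size - 1) - 1] that is congruent to u modulo
   2^(8 size). Restricted to size <= 4 so that all values fit a long long. */
static long long signed_from_unsigned(unsigned long long u, unsigned size) {
    assert(size >= 1 && size <= 4);
    long long modulus = 1LL << (8 * size);
    long long half = modulus / 2;
    assert(u < (unsigned long long)modulus);
    long long w = (long long)u;
    return w >= half ? w - modulus : w;
}
```

For example, the one-byte reading 0xFF is recomposed as -1, while 0x7F is left unchanged.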

4.3.2 Concrete Semantics 

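Expression Semantics. The concrete semantics
$\llbracket e \rrbracket : \mathcal{D}_M(\mathcal{V}_c) \to \mathcal{P}(\mathbb{V}_\tau)$ of an expression e of type $\tau \in$ scalar-type
associates a set of values to a memory state. Most of its
definition can be readily extracted from the C norm [14]
and the IEEE 754-1985 norm [13]. We present here only the
part related to our non-standard definition of the memory.
It corresponds to the semantics of pointers and dereferences:

• $\llbracket \&V \rrbracket(M) \stackrel{\text{def}}{=} \{(V,0)\}$

• $\llbracket e + e' \rrbracket(M) \stackrel{\text{def}}{=} \{ (V, i+j) \mid 0 \le i+j \le \mathrm{sizeof}(V),\ (V,i) \in \llbracket e \rrbracket(M),\ j \in \llbracket e' \rrbracket(M) \}$

• $\llbracket e - e' \rrbracket(M) \stackrel{\text{def}}{=} \{ i - j \mid (V,i) \in \llbracket e \rrbracket(M),\ (V,j) \in \llbracket e' \rrbracket(M) \}$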

.,( P «, e ,(M)g{W ™ £,0 ' a> 

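• $\llbracket *_\tau e \rrbracket(M) \stackrel{\text{def}}{=} \bigcup \{ \phi_\tau(M(V,i), \ldots, M(V, i + \mathrm{sizeof}(\tau) - 1)) \mid (V,i) \in \llbracket e \rrbracket(M),\ i + \mathrm{sizeof}(\tau) \le \mathrm{sizeof}(V),\ i \equiv 0\ [\mathrm{alignof}(\tau)] \}$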

As before, the non-determinism allows a loose but sound
modeling of concrete actions. Erroneous computations (such 
as overflows in pointer arithmetics and out-of-bound or 
misaligned pointers in dereferences) halt the program, and 
so, do not contribute to the set of accessible states. 

Instruction Semantics. The semantics
$\llbracket i \rrbracket : \mathcal{D}_M(\mathcal{V}_c) \to \mathcal{P}(\mathcal{D}_M(\mathcal{V}_c))$ of an instruction i maps a memory state before
the instruction to a set of possible memory states after the
instruction. It is defined as follows:

• tests filter out environments that cannot satisfy the test:

$\llbracket e == 0\,? \rrbracket(M) = \{ M \mid 0 \in \llbracket e \rrbracket(M) \}$



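• copy assignments perform a byte-per-byte copy:

$\llbracket *_\tau e \leftarrow *_\tau f \rrbracket(M) = \{ M[(V,i) \mapsto M(W,j), \ldots, (V,i+n) \mapsto M(W,j+n)] \mid (V,i) \in \llbracket e \rrbracket(M),\ (W,j) \in \llbracket f \rrbracket(M),\ n = \mathrm{sizeof}(\tau) - 1,\ i + n < \mathrm{sizeof}(V),\ j + n < \mathrm{sizeof}(W) \}$

• regular assignments evaluate the right-hand expression
and store its byte components into the memory:

$\llbracket *_\tau e \leftarrow f \rrbracket(M) = \{ M[(V,i) \mapsto (\tau, 0, v), \ldots, (V,i+n) \mapsto (\tau, n, v)] \mid (V,i) \in \llbracket e \rrbracket(M),\ v \in \llbracket f \rrbracket(M),\ n = \mathrm{sizeof}(\tau) - 1,\ i + n < \mathrm{sizeof}(V) \}$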

Note how the value conversion $\phi$ due to pointer casts only
occurs at memory reads, i.e., in a lazy way, so as to reduce
the precision loss. Most of the time, we fall in the first case
of Fig. [7]: we read back a byte sequence corresponding to a
value v stored by a previous assignment of matching type;
$\phi$ returns the singleton {v} and there is no loss of precision.

When $\tau$ is scalar, $*_\tau e \leftarrow *_\tau e'$ can be considered as either
type of assignment, but the copy assignment form is more
precise because it avoids interpreting the memory contents
via $\phi$. This allows the precise modeling of the polymorphic
memory copy functions of Figs. [2] and [3] as byte-per-byte copies.
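
The interplay between symbolic byte values, regular assignments, and lazy recomposition at reads can be sketched on a restricted model. The code below is an illustration under strong assumptions: only three unsigned integer types are modeled (so that recomposition is deterministic), byte ordering is little-endian, a single variable is represented as an array of triples, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar types of the sketch; each enumerator equals its byte size. */
enum scalar { U8 = 1, U16 = 2, U32 = 4 };

/* Symbolic byte (tau, b, v): byte number b of value v stored with type tau. */
struct byte_val { enum scalar type; unsigned index; unsigned long value; };

static unsigned long max_of(enum scalar t) {
    return t == U32 ? 0xFFFFFFFFul : (1ul << (8 * (unsigned)t)) - 1;
}

/* New variables are zero-initialized with (unsigned char, 0, 0) bytes. */
static void mem_init(struct byte_val *mem, size_t n) {
    for (size_t i = 0; i < n; i++) {
        mem[i].type = U8;
        mem[i].index = 0;
        mem[i].value = 0;
    }
}

/* Regular assignment: store the triples (tau, 0, v) ... (tau, n, v). */
static void store(struct byte_val *mem, size_t off, enum scalar t,
                  unsigned long v) {
    assert(v <= max_of(t));
    for (unsigned k = 0; k < (unsigned)t; k++) {
        mem[off + k].type = t;
        mem[off + k].index = k;
        mem[off + k].value = v;
    }
}

/* Recomposition of one byte from an integer triple:
   floor(v / 256^b) mod 256. */
static unsigned byte_of(struct byte_val x) {
    return (unsigned)((x.value >> (8 * x.index)) & 0xFFul);
}

/* Dereference as an unsigned integer of type t: sum of 256^k * byte k. */
static unsigned long load(const struct byte_val *mem, size_t off,
                          enum scalar t) {
    unsigned long r = 0;
    for (unsigned k = 0; k < (unsigned)t; k++)
        r |= (unsigned long)byte_of(mem[off + k]) << (8 * k);
    return r;
}
```

Replaying the register example: after storing 0x1234 as a 16-bit value at offset 0, loading one byte at offset 1 gives 0x12; after overwriting the byte at offset 0 with 0xFF, loading the 16-bit value gives 0x12FF, recomposed from bytes that were written with two different types.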

Variable Creation and Destruction. When creating a
new (zero-initialized) variable $V$ of type $\tau$, new byte
locations initialized to the value (unsigned char, 0, 0) are added
to $B$. However, deleting a variable $V$ from a memory state
$M$ is more complex. We must not only remove some byte
locations from $B$, but also invalidate pointers to $V$ in the
remaining locations, which gives the following memory state:

$$(W, i) \mapsto \begin{cases} (\mathrm{ptr}, j, \omega) & \text{if } M(W, i) = (\mathrm{ptr}, j, (V, \cdot)) \\ M(W, i) & \text{otherwise} \end{cases}$$

4.4 Memory Abstractions 

We now present computable abstractions of the concrete
memory domain. We are able to retrieve, in Sect. 4.4.3, the
analysis presented in Sect. 3 in a sound and formal way. We
also present, in Sect. 4.4.4, a memory equality abstraction
to improve its precision in the presence of copy assignments.

4.4.1 Scalar Value Abstraction 

We suppose that we are given a numerical abstract domain
$\mathcal{D}^\sharp_{\mathbb{R}}(N)$ able to abstract environments over a set $N$
of cells with real type. That is, its concretization $\gamma_{\mathbb{R}}$ lives
in $\mathcal{D}^\sharp_{\mathbb{R}}(N) \to \mathcal{P}(N \to \mathbb{R})$, and it features assignment and test
transfer functions on expressions involving only real-valued
constants, cells in $N$, and arithmetic operators. We refer the
reader to [5, 16] for example definitions, including support
for relational invariants and floating-point arithmetics.

Our first task is to add support for pointer values $\mathbb{V}_{ptr}$ to
$\mathcal{D}^\sharp_{\mathbb{R}}(N)$. As explained in Sect. 3.3, the base component of a
pointer is abstracted as a set of variables or functions while
its offset is assigned a dimension in the numerical domain.
Given a collection $C$ of cells of scalar type, the enhanced
domain $\mathcal{D}^\sharp_V(C)$ is constructed as follows:

$$\mathcal{D}^\sharp_V(C) \stackrel{\text{def}}{=} \mathcal{D}^\sharp_{\mathbb{R}}(C) \times (C_{ptr} \to \mathcal{P}(\mathcal{V}_c \cup \{\omega, 0\}))$$

where $C_{ptr}$ is the subset of $C$ with pointer type. A pair
$(N, P) \in \mathcal{D}^\sharp_V(C)$ represents the set $\gamma_V((N, P))$ of
environments $\rho : C \to \bigcup_\tau \mathbb{V}_\tau$ such that, for some $\sigma \in \gamma_{\mathbb{R}}(N)$: if $V$
has real type, then $\rho(V) = \sigma(V)$; if $V$ is a pointer, then either
$\rho(V) \in P(V) \cap \{\omega, 0\}$ or $\rho(V) \in (P(V) \setminus \{\omega, 0\}) \times \{\sigma(V)\}$.

At the level of $\mathcal{D}^\sharp_V$, we accept the same expressions as
in $\mathcal{D}^\sharp_{\mathbb{R}}$ with the addition of pointer arithmetics, excluding
pointer dereferencing. As pointer arithmetics has been broken
down to the byte level, we can feed any instruction directly
to $\mathcal{D}^\sharp_{\mathbb{R}}$ and obtain its effect on the offset information.
The effect on pointer bases is derived by structural induction
on expressions. For instance, if $p$, $q$ and $i$ are respectively
two pointers and an integer variable, then the assignment
$\{\!| q \leftarrow p + i |\!\}^\sharp_V(N, P)$ in $\mathcal{D}^\sharp_V$ will return the abstract pair
$(\{\!| q \leftarrow p + i |\!\}^\sharp_{\mathbb{R}}(N),\ P[q \mapsto P(p)])$ stating that $q$ now points
to the same base variables as $p$, and its offset is that of $p$
plus $i$. The binary abstract operators, such as union $\cup^\sharp_V$
and ordering $\sqsubseteq^\sharp_V$, are defined point-wisely. These are quite
unoriginal, and so, we do not detail them further.
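
As an illustration, the pointer assignment example can be sketched with plain intervals standing in for the numerical domain. The dictionary-based encoding of the pair $(N, P)$ and the names `add_itv` and `assign_ptr_plus_int` are assumptions made for this sketch only; a real numerical domain would typically be relational.

```python
def add_itv(a, b):
    """Interval addition on (lo, hi) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def assign_ptr_plus_int(state, q, p, i):
    """Abstract effect of q <- p + i on a pair (N, P):
    N maps cells to offset/value intervals, P maps pointers to base sets."""
    N, P = state
    N2 = dict(N)
    N2[q] = add_itv(N[p], N[i])   # offset of q: offset of p plus i
    P2 = dict(P)
    P2[q] = set(P[p])             # q gets the same bases as p
    return (N2, P2)

state = (
    {"p": (0, 8), "i": (4, 4), "q": (0, 0)},
    {"p": {"buf"}, "q": {"NULL"}},
)
N, P = assign_ptr_plus_int(state, "q", "p", "i")
```

The numerical part is handled exactly as an integer assignment, while the base set is simply copied from `p` to `q`.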

4.4.2 Offset Abstraction 

In practice, $\mathcal{D}^\sharp_{\mathbb{R}}$ is not a single numerical domain but a
reduced product of several domains specifically chosen to fit
the kinds of invariants found in an application domain. In
our case, reactive control-command software, this includes
plain intervals [5], relational octagons [15], and
domain-specific filter domains [9]. Now that we rely on $\mathcal{D}^\sharp_{\mathbb{R}}$ to also
abstract pointer offsets, new kinds of numerical invariants
are needed and we must enrich our product. An important
property to infer is pointer alignment, such as $p \equiv 0 \ [4]$ when
$p$ is used to access elements of byte-size 4 in an array. For
this, we use the simple congruence domain [11, 2].
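
A minimal sketch of how a congruence abstraction can establish such an alignment property is given here. The pair encoding `(m, r)` for the set $\{ r + k m \mid k \in \mathbb{Z} \}$ and the helper names `cong_join` and `is_aligned` are assumptions of this sketch, not the interface of any particular analyzer.

```python
from math import gcd

def cong_join(a, b):
    """Join of two congruences (m, r), each denoting { r + k*m | k integer };
    m == 0 denotes the constant r."""
    (m1, r1), (m2, r2) = a, b
    g = gcd(gcd(m1, m2), abs(r1 - r2))
    return (g, r1 % g) if g else (0, r1)

def is_aligned(c, align):
    """True when every value denoted by c is a multiple of `align`."""
    m, r = c
    return m % align == 0 and r % align == 0

# Offsets taken by a pointer stepping through 4-byte array elements.
p = (0, 0)
for k in (4, 8, 12):
    p = cong_join(p, (0, k))
```

After joining the offsets 0, 4, 8 and 12, the abstract value is `(4, 0)`, which proves that every access through `p` is 4-byte aligned even though the interval bounds alone would not.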

Although the combination of intervals and congruences
seems sufficient in most cases, preliminary experiments
suggest the need to infer invariants of the more general form
$p \in \sum_i [a_i, b_i] \times c_i$ to represent, e.g., slices in multi-dimensional
arrays. No such domain exists; its construction is left as
future work. Alternate ideas include using the reduced product
of linear equalities and intervals, as done by Venet [23].

4.4.3 Cell-Based Memory Abstraction 

Cell Universe. In order to use the value domain $\mathcal{D}^\sharp_V$, we
need to map memory bytes in $\mathcal{D}_M(\mathcal{V}_c)$ to cells of scalar type.
Given $\rho \in \mathcal{D}_M(\mathcal{V}_c)$, for each binding $\rho(V, i) = (\tau, b, v)$, we
must consider a cell of type $\tau$ at offset $i - b$ in variable $V$,
with value $v \in \mathbb{V}_\tau$. We define the following cell universe:

$$\mathcal{C}_{all}(\mathcal{V}_c) \stackrel{\text{def}}{=} \{\, (V, i, \tau) \mid V \in \mathcal{V}_c,\ \tau \in \textit{scalar-type},\ i \ge 0,\ i + \mathrm{sizeof}(\tau) \le \mathrm{sizeof}(V) \,\}$$

where $(V, i, \tau)$ corresponds to a cell of type $\tau$ starting at
offset $i$ in variable $V$. It models bytes at locations $(V, i + b)$
for all $b$ in $[0, \mathrm{sizeof}(\tau) - 1]$. We will say that two cells
overlap when the byte locations they model overlap. When
extracting cells from a concrete state, we can encounter
overlapping cells, e.g., (regs, 0, uint16) and (regs, 1, uint8) at
program point (2) in Fig. 4. As an abstract memory state
is supposed to represent a set of concrete states, we must
consider a fortiori overlapping cells to accurately model all
possible memory structures.
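
The overlap relation and the membership condition of the cell universe are easy to state operationally. The following sketch assumes a hypothetical `SIZEOF` table and represents a cell as a tuple `(V, i, type)`; the function names are illustrative.

```python
SIZEOF = {"uint8": 1, "uint16": 2, "uint32": 4}  # assumed sizes

def bytes_of(cell):
    """Byte locations (V, i + b) modelled by a cell (V, i, type)."""
    var, off, ty = cell
    return {(var, off + b) for b in range(SIZEOF[ty])}

def overlap(c1, c2):
    """Two cells overlap when they model a common byte location."""
    return bool(bytes_of(c1) & bytes_of(c2))

def in_universe(cell, var_size):
    """Membership in the cell universe for a variable of `var_size` bytes."""
    _, off, ty = cell
    return off >= 0 and off + SIZEOF[ty] <= var_size
```

With these definitions, `(regs, 0, uint16)` and `(regs, 1, uint8)` overlap because both model the byte at `(regs, 1)`.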

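Abstract States. An abstract memory state, in $\mathcal{D}^\sharp_M(\mathcal{V}_c)$, is
given by a subset $C$ of the cell universe, together with an
abstract element in $\mathcal{D}^\sharp_V(C)$ giving the cell contents:

$$\mathcal{D}^\sharp_M(\mathcal{V}_c) \stackrel{\text{def}}{=} \{\, (C, X) \mid C \subseteq \mathcal{C}_{all}(\mathcal{V}_c),\ X \in \mathcal{D}^\sharp_V(C) \,\}$$

A pair represents the following set of memory states:

$$\begin{array}{l}
\gamma_M(\{(V_1, i_1, \tau_1), \ldots, (V_n, i_n, \tau_n)\}, X) \stackrel{\text{def}}{=} \{\, \rho \in \mathcal{D}_M(\mathcal{V}_c) \mid \\
\quad \forall x_1 \in \phi_{\tau_1}(\rho(V_1, i_1), \ldots, \rho(V_1, i_1 + \mathrm{sizeof}(\tau_1) - 1)), \\
\quad \quad \vdots \\
\quad \forall x_n \in \phi_{\tau_n}(\rho(V_n, i_n), \ldots, \rho(V_n, i_n + \mathrm{sizeof}(\tau_n) - 1)), \\
\quad (x_1, \ldots, x_n) \in \gamma_V(X) \,\}
\end{array}$$

Note the universal quantifiers which mean that, when two
cells from $C$ overlap at a byte location $(V, i)$, $\gamma_M(C, X)$ selects
only concrete environments whose byte values at $(V, i)$ are
compatible with both cell values from $\gamma_V(X)$. Hence the term
intersection semantics used in Sect. 3.4. Moreover, when a
byte location is not covered by any cell in $C$, it can take any
value in the concrete world.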

Cell Realization. It would be conceptually simpler to
always consider $C = \mathcal{C}_{all}$, but quite costly as the time and
memory complexity of $\mathcal{D}^\sharp_V$ depends directly on the size of
$C$. Thus, $C$ is chosen dynamically. As $\gamma_M$ has universal
quantifiers, it is always safe to remove any cell $c$ from $C$:
$\gamma_M(C, X) \subseteq \gamma_M(C \setminus \{c\}, X|_{C \setminus \{c\}})$. Adding a new cell $c$ is
more complex: we must initialize its value according to
existing cells overlapping $c$ so as not to forget any concrete
state. We call this operation cell realization. First, the cell
is created and initialized to $\mathbb{V}_\tau$, which is sound. Then, the
value is refined by scanning the set of overlapping cells for
certain patterns and applying test transfer functions in $\mathcal{D}^\sharp_V$
accordingly. For instance, when trying to add the cell ah
in the cell set $C_1$ of Fig. 5, one finds the overlapping cell
ax. According to the $\phi_{\text{signed char}}$ function of Fig. 7, we
can apply the transfer function $\{\!| \mathtt{ax}/256 - \mathtt{ah} == 0? |\!\}^\sharp_V$.
Note that, if $\mathcal{D}^\sharp_V$ contains relational domains, the relationship
between the realized and the overlapping cells will be kept.
For instance, if $\mathcal{D}^\sharp_V$ is able to represent the invariant ah =
ax/256, then, whenever we learn something new on the value
of one cell, it will be immediately reflected on the other one.
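
The two steps of realization (sound initialization, then refinement by a test) can be sketched on the ah/ax example. The sketch makes several assumptions: a non-relational interval environment, ax treated as an unsigned 16-bit cell whose high byte is ah, and a hypothetical function name `realize_high_byte`. A relational domain would additionally keep the link between the two cells, which intervals cannot.

```python
def realize_high_byte(env, new_cell, wide_cell):
    """Add `new_cell`, the high byte of the unsigned 16-bit `wide_cell`,
    to an interval environment mapping cell names to (lo, hi) pairs."""
    out = dict(env)
    lo, hi = 0, 255                      # step 1: all values of the type
    if wide_cell in env:                 # step 2: test new == wide / 256
        wlo, whi = env[wide_cell]
        lo, hi = max(lo, wlo // 256), min(hi, whi // 256)
    out[new_cell] = (lo, hi)
    return out

env = realize_high_byte({"ax": (0x0100, 0x02FF)}, "ah", "ax")
```

When no overlapping cell is present, the realized cell simply keeps the full range of its type, which is the sound default.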

Abstract Operators. Assignments and tests are transformed
by replacing dereferences with cell sets, and then
fed to the underlying value domain. Given a sub-expression
$*_\tau e$, where $e$ is dereference-free, $e$ is first evaluated in $\mathcal{D}^\sharp_V$
which returns the set $S$ of byte locations it can point to. All
cells $C' = \{\, (V, i, \tau) \mid (V, i) \in S \,\}$ are then realized in the
current abstract state, if not already there. The resolution
continues with the enriched abstract state for the expression
where $*_\tau e$ has been replaced with the cell set $C'$. Tests
can be directly executed in $\mathcal{D}^\sharp_V$ on the resulting expressions.
Assignments are a little more complex because they involve
memory writes. Given an assigned cell $c$, we first realize $c$,
then execute the assignment in $\mathcal{D}^\sharp_V$, and finally remove all
cells overlapping $c$. Note that a dereference may resolve in
more than one cell, $|C'| > 1$, which results in weak updates
in $\mathcal{D}^\sharp_V$.

We now define the abstraction of a binary operator $\circ$.
Given the states $S_1 = (C_1, X_1)$ and $S_2 = (C_2, X_2)$,
we first unify the cell sets using realization to obtain two
states $S'_1 = (C_1 \cup C_2, X'_1)$ and $S'_2 = (C_1 \cup C_2, X'_2)$. We then
apply the binary operator on the underlying value domain
and get $S_1 \circ^\sharp_M S_2 = (C_1 \cup C_2, X'_1 \circ^\sharp_V X'_2)$. This is sound with
respect to overlapping cells. However, because overlapping
cells have an intersection semantics, we may lose some
precision on $\cup^\sharp_M$: informally, we over-approximate $(a \cap b) \cup (c \cap d)$
as $(a \cup c) \cap (b \cup d)$. The widening $\nabla^\sharp_M$ stabilizes invariants
by first stabilizing the cell set (which is an increasing
subset of the finite set $\mathcal{C}_{all}$) and then relies on the underlying
widening $\nabla^\sharp_V$. The abstract order $\sqsubseteq^\sharp_M$ is defined as $\sqsubseteq^\sharp_V$ after
cell sets have been unified to $C_1 \cup C_2$.
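
The treatment of memory writes, including the distinction between strong and weak updates, can be sketched as follows. This is an illustration under assumptions: intervals stand in for the value domain, the target cells are taken to be already realized and pairwise non-overlapping, and `overlapping` is a hypothetical caller-supplied function listing the cells (other than the target itself) that overlap a given cell.

```python
def join(a, b):
    """Interval join on (lo, hi) pairs."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def assign(env, targets, value, overlapping):
    """Write interval `value` to the cells in `targets`.
    A single target gives a strong update; several targets give weak
    updates, where the old value is joined with the new one."""
    out = dict(env)
    strong = len(targets) == 1
    for c in targets:
        out[c] = value if strong else join(env[c], value)
    for c in targets:
        for d in overlapping(c):   # overlapping cells are no longer valid
            out.pop(d, None)
    return out

env = {"ax": (0, 10), "ah": (0, 0), "bx": (5, 5)}
ov = lambda c: {"ax": ["ah", "al"], "bx": []}.get(c, [])
strong_result = assign(env, ["ax"], (7, 7), ov)
weak_result = assign(env, ["ax", "bx"], (7, 7), ov)
```

Dropping the overlapping cell `ah` is always safe, since removing a cell only enlarges the set of represented concrete states; it can be realized again on the next read.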

There are strong similarities between the abstract cell
realization and the concrete value recomposition $\phi$. Both
are used, in a lazy way, to reconstruct information when the 
type of a dereference mismatches that of the currently stored 
value. Both are defined according to an ABI and the level 
of modeling required by the user. Both may result in a loss 
of precision. Thus, once a cell is realized, we try to keep it 
around as long as possible (i.e., until it is invalidated by a 
memory write). 

4.4.4 Memory Equality Predicate Domain $\mathcal{D}^\sharp_{Eq}$

When analyzing generic memory copy functions, $\mathcal{D}^\sharp_M$
sometimes lacks the required precision. Consider, for instance,
calling the function memcopy(&a,&b,4) from Fig. 2, a and b
being 4-byte integers. Although it is equivalent to the plain
assignment a=b, it is carried out one byte at a time. $\mathcal{D}^\sharp_M$ will
first realize individual bytes in b as char cells, copy them into
a and, the first time a is read, realize back the four char cells
as a single integer cell. Because each realization may result
in some loss of precision, the inferred value set for the cell
(a, 0, int) may be much larger than that of (b, 0, int).

In order to solve this problem, we introduce a specific
abstraction $\mathcal{D}^\sharp_{Eq}$ of $\mathcal{D}_M$ that tracks equalities between byte
values in a symbolic way:

$$\mathcal{D}^\sharp_{Eq}(\mathcal{V}_c) \stackrel{\text{def}}{=} \mathcal{V}_c \to ((\mathbb{N} \times \mathcal{V}_c \times \mathbb{N} \times \mathbb{N}) \cup \{\top^\sharp_{Eq}\})$$

where a binding $V \mapsto (s, W, d, l)$ means that the $l$ bytes
starting at location $(V, s)$ are equal to those starting at
location $(W, d)$, while $\top^\sharp_{Eq}$ means "no information":

$$\gamma_{Eq}(e) \stackrel{\text{def}}{=} \{\, \rho \in \mathcal{D}_M(\mathcal{V}_c) \mid \forall V \in \mathcal{V}_c,\ e(V) = (s, W, d, l) \implies \forall\, 0 \le i < l,\ \rho(V, s + i) = \rho(W, d + i) \,\}$$

Note that only one predicate is kept per variable, and the
parameters $(s, W, d, l)$ are bound to concrete values. This
ensures efficient transfer functions but requires memory copy
loops to be fully unrolled. (We could benefit from more
complex predicate abstraction schemes to overcome this
restriction, e.g., use [4] to keep $(s, d, l)$ symbolic and relate
their value in $\mathcal{D}^\sharp_{\mathbb{R}}(N)$. This was not required in our
experience as the codes we analyze only copy small structures.)

Among instructions, only copy assignments are treated
precisely: tests are safely ignored while other assignments are
dealt with by removing bindings involving the destination,
i.e., setting them to $\top^\sharp_{Eq}$. Suppose that $e(V) = (s, W, d, l)$
and we copy $l'$ bytes from $(V, s')$ to $(W', d')$; several cases
arise. When $W = W'$, $s - d = s' - d'$ and $s' \in [s, s + l]$,
we copy bytes at the end of equal zones. We thus grow the
zones by setting $e(V) = (s, W, d, \max(l, l' + s' - s))$. The
case is similar when bytes are copied at the start of zones:
$W = W'$, $s - d = s' - d'$ and $s \in [s', s' + l']$. In all other cases,
the former binding is useless and we replace it by a new one
$e(V) = (s', W', d', l')$. As $W'$ is modified, we must also, in all
cases, remove any other binding involving $W'$. We say that
$e_1 \sqsubseteq^\sharp_{Eq} e_2$ whenever, for every $V$, either $e_2(V) = \top^\sharp_{Eq}$ or
$e_2(V)$ corresponds to a sub-range of $e_1(V)$, so that every
equality stated by $e_2$ is implied by $e_1$. This order has a
least upper bound, which serves to define the abstract union,
but no greatest lower bound. As $\mathcal{D}^\sharp_{Eq}$ has a finite height, no
widening is necessary to help the iterates converge.
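
The case analysis for copy assignments can be sketched as a small transfer function. Everything here is an illustrative reading rather than the analyzer's code: bindings are stored in a dictionary, an absent key plays the role of "no information", the source and destination variables are assumed distinct, and the grown length is computed as the extent of the merged zone measured from its start, i.e. `max(l, s1 + l1 - s)` at the end of a zone and `max(l1, s + l - s1)` at its start.

```python
TOP = None  # "no information"

def copy_transfer(e, V, s1, W1, d1, l1):
    """Effect on e of copying l1 bytes from (V, s1) to (W1, d1)."""
    out = {}
    for U, b in e.items():
        # W1 is overwritten: forget the other bindings that involve it
        if U != V and b is not TOP and U != W1 and b[1] != W1:
            out[U] = b
    new = (s1, W1, d1, l1)                # default: replace the binding
    old = e.get(V, TOP)
    if old is not TOP:
        s, W, d, l = old
        if W == W1 and s - d == s1 - d1:
            if s <= s1 <= s + l:          # copy at the end of the zone
                new = (s, W, d, max(l, s1 + l1 - s))
            elif s1 <= s <= s1 + l1:      # copy at the start of the zone
                new = (s1, W, d1, max(l1, s + l - s1))
    out[V] = new
    return out

# Byte-per-byte copy of the 4-byte variable b into a, fully unrolled.
e = {}
for k in range(4):
    e = copy_transfer(e, "b", k, "a", k, 1)
```

After the fourth byte, the single binding for `b` states that its 4 bytes equal those of `a`, which is the information needed to recover the equivalent whole-word assignment.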

Figure 8. Regression tests for Astree.

We perform a partially reduced product between $\mathcal{D}^\sharp_M$
and $\mathcal{D}^\sharp_{Eq}$. All abstract operations are performed in parallel.
In addition, we propagate information from $\mathcal{D}^\sharp_{Eq}$ to $\mathcal{D}^\sharp_M$
after each copy assignment. For each cell $(V, o, \tau)$, if we just
discovered that $e(V) = (s, W, d, l)$ and $[o, o + \mathrm{sizeof}(\tau)] \subseteq [s, s + l]$,
then we realize the cell $(W, o - s + d, \tau)$ and perform
the assignment $*_\tau(\&W + o - s + d) \leftarrow *_\tau(\&V + o)$ in $\mathcal{D}^\sharp_M$.
In our example, memcopy(&a,&b,4), we would generate the
assignment $a \leftarrow b$ just after copying the 4th byte. Thus, the
value for the cell (a, 0, int) is precisely that of (b, 0, int) and
no longer needs to be realized from the value of char cells.
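
The propagation condition amounts to a range inclusion test followed by an offset translation. The sketch below assumes a hypothetical `SIZEOF` table and a hypothetical helper name `propagated_cell`; it only computes which cell of the destination should be realized and assigned, not the assignment itself.

```python
SIZEOF = {"int": 4, "char": 1}  # assumed sizes

def propagated_cell(binding, cell):
    """Given e(V) = (s, W, d, l) and a cell (V, o, type), return the cell
    of W known to hold the same bytes, or None if the cell is not fully
    covered by the equal zone."""
    s, W, d, l = binding
    V, o, ty = cell
    if s <= o and o + SIZEOF[ty] <= s + l:
        return (W, o - s + d, ty)
    return None
```

For the memcopy example, the binding `(0, "a", 0, 4)` on `b` and the cell `("b", 0, "int")` yield the cell `("a", 0, "int")`, whereas after only three copied bytes no integer cell is covered yet.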

5. Experiments 

5.1 Presentation of the Astree Analyzer 

Scope. The goal of Astree is to detect statically all
run-time errors in embedded reactive software written in C.
Run-time errors include integer and floating-point arithmetic
overflows, divisions by zero and array out-of-bound accesses.
To achieve this goal, Astree performs an abstract reacha- 
bility analysis and computes the set of values each variable 
can take, considering all program executions in all possible 
environments. To be efficient, it performs many sound but 
incomplete abstractions. As a consequence, it always finds 
all run-time errors but may report spurious alarms. Its ab- 
stractions are tuned towards specific classes of programs in 
order to achieve zero false alarms in practice, within rea- 
sonable time. Indeed, [3] reports its success in proving au- 
tomatically the absence of run-time errors in real industrial 
code of several hundred thousand lines, in a few hours. 

Architecture. Astree has a modular architecture. It
relies on a product of several numerical domains, which can
be plugged in and out. They exchange information via
configurable reductions. It also features a parameterisable abstract
iterator tailored for flow- and context-sensitive analysis, and
trace partitioning to achieve partial path-sensitivity. In
previous work [3], it has been specialised towards the analysis of
embedded avionics software by incorporating adapted
iteration strategies and numerical domains (such as relational
octagons [15] and domain-specific filter domains [9]).
However, its memory model was limited to simple well-structured
data only, which was sufficient at that time. In order to an- 
alyze new code featuring union types and pointer casts, we 
replaced it with our new memory abstractions. Thanks to 
the modular construction of Astree and its modular proof 
of correctness, most parts were not tied to the old memory 
abstraction and could be reused (in particular, all numerical 
and partitioning domains, as well as the iterator). 

5.2 Preliminary Experimental Results 

We have run three kinds of experiments: small case stud- 
ies, regression tests and preliminary analyses of new real- 
life software. They all ran on a 64-bit AMD Opteron 250 
(2.4GHz) workstation, using one processor. The analyzed 
programs do not feature recursion, dynamic memory alloca- 
tion, nor multi-threading. Moreover, they are self-contained: 
they do not call precompiled library routines, and the exter- 
nal environment is modeled using volatile variables. 

Firstly, we tested the relevance of our domains to the
specific problems of union types and pointer casts discussed 
in Sect. 2. We wrote, and were given by end-users, several
constructed programs of a hundred lines, in the spirit of
Figs. 1–4. We were able to prove the absence of run-time
errors in all case studies in a fraction of a second.

Secondly, we re-analyzed the pointer- and union-free
industrial embedded critical code successfully analyzed by
Astree in previous work [3]. Fig. 8 compares the performance
of the old and new memory domains. We see that the
memory peak and time consumption are only slightly increased,
in the worst case, and we find the same alarms. Note that,
as Astree uses incomplete methods — such as partially re- 
duced products and convergence acceleration — there is no 
theoretical guarantee that our new memory semantics al- 
ways gives more precise results than the former one, even 
though it is more expressive. Hence, the importance of as- 
serting experimentally non-regression in terms of precision. 

Thirdly, we analyzed four new pieces of industrial critical
embedded software featuring unions and complex pointer
manipulations. Such codes could not be considered before in Astree
because of its limited legacy memory domain. The analysis
results are shown in Fig. 9. These results are preliminary in
the sense that we have not yet investigated the causes of all
alarms: they may be due to analyzer inaccuracies, but also to
real errors or too conservative assumptions on the
environment. The results are encouraging: they correspond to the
preliminary results obtained on the codes of Fig. 8 before
domain-specific numerical domains and iteration strategies
were incorporated in Astree to achieve zero alarm [3].

6. Related Work 

Several dialects of C, such as CCured [17], have been pro- 
posed to prevent error-prone uses of unions and pointers. 
The value analysis of such dialects, with their cleaner mem- 
ory model, would be easier than the full C. Unfortunately, 
their strengthened type systems would reject constructs 
found legitimate by end-users and force them to rewrite their 
software. For now, we (analysis designers) should adapt our 
analyses to the programming features they currently use. 

There exists a very large body of work concerning pointer 
analyses for C — we refer the reader to the very good sur- 
vey by Hind [12]. Unfortunately, they cannot serve our pur- 
pose. All field-insensitive methods natively support union 
and pointer casts — they are considered "no-op." However, in 
order to find precise bounds on values stored into and then 
fetched from memory, we absolutely require field sensitivity. 
Very few field-sensitive analyses support unions or casts.
Most of them (e.g., the recent work of Whaley and Lam
[24]) assume a memory model à la Java, where the memory
can be a priori partitioned into cells of unchanging type. As
a middle-ground, Yong et al. [27] propose to collapse fields
upon detecting accesses through pointers whose type
mismatches the declared type of the fields. This is not sufficient
to treat precisely union types (Fig. 1) or polymorphism
(Fig. 2). Also, flow-insensitive analyses (such as the union-
and cast-aware analysis by Steensgaard [22]), which are
well-suited for program optimization and understanding, would
not be precise enough for value analysis. Indeed, they
tend to produce large points-to sets (especially given that
we are field-sensitive), which results in weak updates and
precision losses in the numerical domains. When it comes to
program correctness, we are ready to use much more costly
abstractions: each instruction proved correct automatically
saves the user an expensive manual proof.

Instead of relying on the structure of C types, we chose to
represent the memory as flat sequences of bytes. This allows
shifting to a representation of pointers as pairs: a symbolic
base and a numeric offset. It is a common practice; it is
used, for instance, by Wilson and Lam in [25]. This also
suggests combining the pointer and value analyses into a single
one, offsets being treated as integer variables. There is
experimental evidence [18] that this is more precise than a pointer
analysis followed by a value analysis. Some authors rely on
non-relational abstractions of offsets, e.g., a reduced
product of intervals and congruences [2], or intervals together
with byte-size factors [26]. Others, such as [23, 19] or ourselves,
permit more precise, relational offset abstractions.

We stress that using an offset-based pointer
representation solves, by itself, the problem of points-to 
analysis in the presence of union types and casts, but it does 
not solve the problem of analyzing precisely the contents 
of the memory such offset-based pointers point to. Several 
kinds of solution have been used to avoid treating this second 
problem. A first one is to perform the field-sensitive points-to 
and value analysis of only a part of the memory that is never 
accessed through casts — e.g., the surface structure of [23] — 
while the rest is only checked for in-bound accesses. A second 
one is to fix one memory layout — using, e.g., the declared 
variable types or some pointer alignment constraints — and 
conservatively assume that mismatching dereferences result 
in any value [2]. A less conservative solution, proposed by 
Wilson and Lam [25], is to consider that a dereference can 
output the value of any overlapping cell. We are more
precise and more general because we allow value recomposition
from individual bytes of partially overlapping cells and take
into account the bit-representation of types. In particular,
unlike previous work, we can analyze precisely the indirect
dereferencing following the memory copy of Fig. 2. Moreover,
while [25] often resolves a dereference into several overlap- 
ping cells, even when the target of the dereference is precisely 
known, we manage to select a single cell most of the time. 
This reduces the possibility of weak updates and improves 
the analysis precision, especially when using relational nu- 
merical domains. To our knowledge, our method is the first 
one that allows discovering precise relational invariants in 
the presence of union types and pointer casts. 

Finally, note that most articles — [23] being a notable 
exception — directly leap from a memory model informally 
described in English to the formal description of a static 
analysis. Following the Abstract Interpretation framework, 
we give a full mathematical description of the memory model 
before presenting computable abstractions proved correct 
with respect to the model. 

7. Future Work 

A first goal is to reduce the number of alarms in the newly
analyzed codes of Fig. 9. In the best scenario, most
inaccuracies will be solved by tweaking already existing parameters,
such as the level of path sensitivity or domain relationality.



However, we will probably also need to add new numerical
domains in the reduced product $\mathcal{D}^\sharp_{\mathbb{R}}$, as it was necessary in
order to achieve the proofs of absence of run-time errors in
[3]. We plan to investigate particularly the numerical
domains required to abstract pointer offsets precisely, as it is
a new requirement of our memory abstractions. Finally, by 
iterating the analyzer refinement process over other codes 
involving unions and pointers, we hope to provide a library 
of abstractions that, in practice, is sufficient to analyze a 
large class of embedded C programs. 

Further goals include incorporating domains for
heap-allocated objects, e.g., related to predicate-based
summarization as proposed by Sagiv et al. [20, 10]. We also wish to
include other memory abstractions within our framework,
for instance, the string abstraction by Dor et al. [7] as well
as generalizations of $\mathcal{D}^\sharp_{Eq}$ using predicate abstractions
parameterized by numerical domains à la Cousot [4].

8. Conclusion 

In this article, we proposed new techniques to perform the 
precise value analysis of C programs with pointers and union 
types. We first gave a precise meaning to such programs by 
defining a concrete memory semantics, parameterized by an 
Application Binary Interface. We then proposed two com- 
putable abstractions: a value abstraction, parameterized by 
the choice of a numerical abstract domain, and an equal- 
ity predicate abstraction, able to precisely deal with poly- 
morphic memory copies. The combined abstractions have 
been implemented within the Astree parametric static an- 
alyzer that checks for run-time errors in embedded critical 
C software. Preliminary experimental results are encourag- 
ing: while not sacrificing the precision and efficiency of As- 
tree on legacy analyses — in particular, the proof of absence 
of run-time errors for some large industrial codes in a few 
hours of computation time — we greatly enlarge the class of 
analyzable programs. Currently, small test cases containing 
pointers and unions have been proved correct while there 
are still a few dozen alarms on real-life industrial examples.
We are confident that these results will be improved in the 
future by refining the analyzer. 